Browsing NTNU Open by author "Svendsen, Torbjørn Karl"

A Comparative Study of Deep Learning Techniques on Frame-Level Speech Data Classification

Sabzi Shahrebabaki, Abdolreza; Imran, Ali Shariq; Olfati, Negar; Svendsen, Torbjørn Karl (Journal article; Peer reviewed, 2019)

This paper provides a comprehensive analysis of the effect of speaking rate on frame classification accuracy. Different speaking rates may affect the performance of the automatic speech recognition system yielding poor ...

A Deep Learning Approach to Spoken Language Acquisition

Rugayan, Janine (Master thesis, 2021)

The process of human spoken language acquisition is still being studied to this day. The most popular theory, from B.F. Skinner, describes the language learning of infants as a verbal behavior controlled by consequences. ...

Acoustic Feature Comparison for Different Speaking Rates

Sabzi Shahrebabaki, Abdolreza; Imran, Ali Shariq; Olfati, Negar; Svendsen, Torbjørn Karl (Chapter, 2018)

This paper investigates the effect of speaking rate variation on the task of frame classification. This task is indicative of the performance on phoneme and word recognition and is a first step towards designing voice-controlled ...

Acoustic-to-Articulatory Mapping With Joint Optimization of Deep Speech Enhancement and Articulatory Inversion Models

Sabzi Shahrebabaki, Abdolreza; Salvi, Giampiero; Svendsen, Torbjørn Karl; Siniscalchi, Sabato Marco (Journal article; Peer reviewed, 2021)

We investigate the problem of speaker independent acoustic-to-articulatory inversion (AAI) in noisy conditions within the deep neural network (DNN) framework. In contrast with recent results in the literature, we argue ...

Better coding for learning speech representations

Rosberg, Sivert (Master thesis, 2022)

In this thesis, the possibility of using Non-autoregressive Predictive Coding (NPC) to learn speech representations is investigated. NPC is a self-supervised deep learning method which, in contrast to other common self-supervised ...

A character-based analysis of impacts of dialects on end-to-end Norwegian ASR

Parsons, Phoebe; Kvale, Knut; Svendsen, Torbjørn Karl; Salvi, Giampiero (Chapter, 2023)

We present a method for analyzing character errors for use with character-based, end-to-end ASR systems, as used herein for investigating dialectal speech. As end-to-end systems are able to produce novel spellings, there ...

Child Speech Recognition

Steinskog, Kristin Ottesen (Master thesis, 2021)

Speech recognition for children is challenging because today's speech recognition systems are based on adult speech. Speech recognition can aid the development of speech and language in children. It is therefore important to improve speech recognition ...

Decision Algorithm for Parking Sensors

Karami, Hossein (Master thesis, 2020)

Studies show that up to one third of all urban congestion is caused by drivers looking for a place to park. American drivers spend an average of 17 hours a year searching for free parking spaces ...

A DNN Based Speech Enhancement Approach to Noise Robust Acoustic-to-Articulatory Inversion

Sabzi Shahrebabaki, Abdolreza; Siniscalchi, Sabato Marco; Salvi, Giampiero; Svendsen, Torbjørn Karl (Chapter, 2021)

In this work, we investigate the problem of speaker independent acoustic-to-articulatory inversion (AAI) in noisy conditions within the deep neural network (DNN) framework. We claim that DNN vector-to-vector regression for ...

Enhancement of Noisy Speech Using Deep Learning

Turøy, Ida; Mo, Kari Vikøren (Master thesis, 2021)

Abstract will be available on 2024-01-07

Enhancement of Noisy Speech Using Deep Learning

Mo, Kari Vikøren; Turøy, Ida (Master thesis, 2021)

Abstract will be available on 2024-01-11

Low-resource speech recognition - Exploring methods of improving performance

Moum, August Høyen; Winnerdal, Skjalg (Master thesis, 2020)

Building an accurate speech recognition system that generalizes sufficiently is no easy task. Limited amounts of transcribed speech data complicate this further, as the systems require large amounts of training data ...

Noise Robustness in Small-Vocabulary Speech Recognition

Haflan, Vetle (Master thesis, 2019)

This master's thesis deals with small-vocabulary speech recognition, and more specifically with noise robustness in systems designed for this purpose. Traditional and modern recognition systems have been trained on relatively large amounts of ...

Optimization of a Convolutional Neural Network for Classification of Radar Signals

Bakkene, Caroline (Master thesis, 2021)

Novelda's ultra-wideband radar can detect human presence by detecting very small movements, such as breathing and heartbeats. This radar technology has many possible applications and is, among other things, used ...

Perceptual and Task-Oriented Assessment of a Semantic Metric for ASR Evaluation

Rugayan, Janine Lizbeth Cabrera; Svendsen, Torbjørn Karl; Salvi, Giampiero (Journal article; Peer reviewed, 2023)

Semantically Meaningful Metrics for Norwegian ASR Systems

Rugayan, Janine Lizbeth Cabrera; Svendsen, Torbjørn Karl; Salvi, Giampiero (Journal article; Peer reviewed, 2022)

Evaluation metrics are important for quantifying the performance of Automatic Speech Recognition (ASR) systems. However, the widely used word error rate (WER) captures errors at the word level only and weighs each error ...

Sequence-to-sequence articulatory inversion through time convolution of sub-band frequency signals

Sabzi Shahrebabaki, Abdolreza; Siniscalchi, Sabato Marco; Salvi, Giampiero; Svendsen, Torbjørn Karl (Journal article; Peer reviewed, 2020)

We propose a new acoustic-to-articulatory inversion (AAI) sequence-to-sequence neural architecture, where spectral sub-bands are independently processed in time by 1-dimensional (1-D) convolutional filters of different ...

Transfer learning of articulatory information through phone information.

Sabzi Shahrebabaki, Abdolreza; Olfati, Negar; Siniscalchi, Sabato Marco; Salvi, Giampiero; Svendsen, Torbjørn Karl (Journal article; Peer reviewed, 2020)

Articulatory information has been argued to be useful for several speech tasks. However, in most practical scenarios this information is not readily available. We propose a novel transfer learning framework to obtain ...

A Two-Stage Deep Modeling Approach to Articulatory Inversion

Sabzi Shahrebabaki, Abdolreza; Olfati, Negar; Imran, Ali Shariq; Johnsen, Magne Hallstein; Siniscalchi, Sabato Marco; Svendsen, Torbjørn Karl (Chapter, 2021)

This paper proposes a two-stage deep feed-forward neural network (DNN) to tackle the acoustic-to-articulatory inversion (AAI) problem. DNNs are a viable solution for the AAI task, but the temporal continuity of the estimated ...